Optimization

The regexMatch code should work for all inputs, but for large inputs, the computed derivative may be an extremely large and unnecessarily complex regular expression. Let’s fix that.

Our approach will be to implement a function optimize that takes in a Regex and uses equivalence rules to transform the regular expression into an equivalent, but hopefully smaller, expression.

To Do

  1. To take full advantage of optimization, our implementation of deriv should produce optimized expressions. We will need to modify the code in src/Evaluation.hs:

    1. Rename the deriv function to deriv', but do not rewrite any calls to deriv.
    2. Now, define deriv to do optimization (you will need to import optimize):
      deriv :: Char -> Regex -> Regex
      deriv c r = optimize (deriv' c (optimize r))
    1. You should now be able to run the matching tests and get the same results as before.
  2. In src/Optimization.hs, add additional cases/code to the optimize function so that it returns a “more optimized” regular expression, by taking advantage of the regular-expression equalities at the bottom of this page. More specifically,

    • the optimize function should use pattern-matching to detect whether the input to the function has the form of the left-hand side of an equality at the bottom of this page, and instead return the corresponding right-hand side. (The provided code should be kept as a final “catch-all” case, so that the optimize function behaves as before when the inputs do not match any of the left-hand sides.)
    • You will want to use recursion to optimize sub-expressions.
    • Use the test suite to verify that you haven’t broken anything. It is easy to make small mistakes or typos; you will avoid a lot of headaches if you re-run the tests after every small change!

Regular-expression equalities

            ∅r  =  ∅
            r∅  =  ∅
            εr  =  r
            rε  =  r

            ¬∅ | r  =  ¬∅
            r | ¬∅  = ¬∅
            ∅ | r   =  r
            r | ∅   =  r

            ∅ & r   =  ∅
            r & ∅   =  ∅
            ¬∅ & r  =  r
            r & ¬∅  =  r
                                    
            (r*)*  = r*
            ε*  =  ε
            ∅*  =  ε
            
            ¬(¬r)  =  r 

            r | r  = r
            r & r  = r

            r | s  = s | r
            r & s  = s & r

         (r s ... ) (t u ...) = (r s ... t u ...)
                r (s t u ...) = (r s t u ...)
                (r s t ...) u = (r s t ... u)

        (r|s|...) | (t|u|...) = (r|s|...|t|u|...)
              r | (s|t|u|...) = (r|s|t|u|...)
              (r|s|t|...) | u = (r|s|t|...|u)
        
        (r&s&...) & (t&u&...) = (r&s&...&t&u&...)  
              r & (s&t&u&...) = (r&s&t&u&...)
              (r&s&t&...) & u = (r&s&t&...&u)